quotation attribution
Identifying Speakers and Addressees of Quotations in Novels with Prompt Learning
Yan, Yuchen, Zhao, Hanjie, Zhu, Senbin, Liu, Hongde, Zhang, Zhihong, Jia, Yuxiang
Quotations in literary works, especially novels, are important for creating characters, reflecting character relationships, and driving plot development. Current research on quotation extraction in novels primarily focuses on quotation attribution, i.e., identifying the speaker of a quotation. However, the addressee of a quotation is also important for constructing the relationship between the speaker and the addressee. To tackle the problem of dataset scarcity, we annotate the first Chinese quotation corpus with elements including speaker, addressee, speaking mode, and linguistic cue. We propose prompt learning-based methods for speaker and addressee identification built on fine-tuned pre-trained models. Experiments on both Chinese and English datasets show the effectiveness of the proposed methods, which outperform zero-shot and few-shot large language models.
A Realistic Evaluation of LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3
Michel, Gaspard, Epure, Elena V., Hennequin, Romain, Cerisara, Christophe
The zero-shot and few-shot performance of Large Language Models (LLMs) is subject to memorization and data contamination, complicating the assessment of their validity. In literary tasks, the performance of LLMs is often correlated with the degree of book memorization. In this work, we carry out a realistic evaluation of LLMs for quotation attribution in novels, taking the instruction fine-tuned version of Llama3 as an example. We design a task-specific memorization measure and use it to show that Llama3's ability to perform quotation attribution is positively correlated with its degree of memorization of the novel. However, Llama3 still performs impressively well on books it has neither memorized nor seen. Data and code will be made publicly available.
Improving Quotation Attribution with Fictional Character Embeddings
Michel, Gaspard, Epure, Elena V., Hennequin, Romain, Cerisara, Christophe
Humans naturally attribute utterances of direct speech to their speaker in literary works. When attributing quotes, we process contextual information but also access mental representations of characters that we build and revise throughout the narrative. Recent methods to automatically attribute such utterances have explored simulating human logic with deterministic rules or learning new implicit rules with neural networks when processing contextual information. However, these systems inherently lack character representations, which often leads to errors on more challenging examples of attribution: anaphoric and implicit quotes. In this work, we propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global information about characters. To build these embeddings, we create DramaCV, a corpus of English drama plays from the 15th to 20th century focused on Character Verification (CV), a task similar to Authorship Verification (AV) that aims to analyze fictional characters. We train a model similar to the recently proposed AV model, Universal Authorship Representation (UAR), on this dataset, showing that it outperforms concurrent character embedding methods on the CV task and generalizes better to literary novels. Then, through an extensive evaluation on 22 novels, we show that combining BookNLP's contextual information with our proposed global character embeddings improves the identification of speakers for anaphoric and implicit quotes, reaching state-of-the-art performance. Code and data will be made publicly available.
Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution
Michel, Gaspard, Epure, Elena V., Hennequin, Romain, Cerisara, Christophe
Recent approaches to automatically detect the speaker of an utterance of direct speech often disregard general information about characters in favor of local information found in the context, such as surrounding mentions of entities. In this work, we explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models in a large corpus of English novels (the Project Dialogism Novel Corpus). Results suggest that the combination of stylistic and topical information captured in some of these models accurately distinguish characters.
[Figure 1: Example of quotation attribution on an excerpt of Pride and Prejudice by Jane Austen (1813). Underlined text are identified mentions, and arrows link quotes to their relevant entity mention (solid arrows are explicit references and dashed arrows are anaphoric references).]
Improving Automatic Quotation Attribution in Literary Novels
Vishnubhotla, Krishnapriya, Rudzicz, Frank, Hirst, Graeme, Hammond, Adam
Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data, which poses a challenge for in-the-wild inference. Here, we approach quotation attribution as a set of four interconnected sub-tasks: character identification, coreference resolution, quotation identification, and speaker attribution. We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels (the Project Dialogism Novel Corpus). We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.